TITLE OF THE INVENTION
ARITHMETIC METHOD AND APPARATUS AND CRYPTO PROCESSING
APPARATUS

BACKGROUND OF THE INVENTION 
The present invention relates to an arithmetic
method and apparatus and a crypto processing apparatus
and, more particularly, to an arithmetic method and
apparatus and a crypto processing apparatus which are
suitably used for crypto coprocessors and the like
implemented in, for example, IC cards and information
electric home appliances.

In implementing an LSI for public-key cryptography,
cryptosystems that perform integer based operations,
such as the RSA (Rivest-Shamir-Adleman) system, have
been mainly used. In such a system, operations must be
performed on integers with a large number of digits.
For this reason, if this system is applied to an IC card
or the like, a special-purpose processor is required.
Many systems that implement such special-purpose
coprocessors to realize long integer operations have
been put into practice.

Recently, attention has been given to cryptosystems
based on an algebraic system called a finite field
GF(2^m) (Galois field), especially elliptic curve
cryptosystems over a finite field GF(2^m), instead of
integer based cryptosystems.

In a cryptosystem using finite field GF(2^m)
arithmetic operations, the number of bits to be handled
must be set to be as large as 160 or more, as in an
integer based operation system such as RSA. For this
reason, if such a system is implemented on a device in
which the performance of a CPU is low, e.g., an IC card,
a relatively long processing time is required.
Therefore, there are demands for an increase in
performance by using special-purpose hardware
(coprocessors).

As described above, according to RSA as well as 
elliptic curve cryptography, special-purpose 
coprocessors must be prepared to realize high-speed 
crypto processing in IC cards and the like. 

FIG. 23 shows the layout of an IC card LSI 
including a coprocessor for crypto processing. 
Referring to FIG. 23, in this LSI, a CPU, RAM, ROM, and 
EEPROM are integrated into one chip, and the coprocessor 
is comprised of a RAM, arithmetic section, and control 
section. The coprocessor assists the CPU in performing
cryptography, e.g., a long exponentiation and the four
fundamental operations of arithmetic, under the control
of the CPU.

FIG. 24 shows a coprocessor in the LSI shown in
FIG. 23. In RSA, this component is implemented as an
integer based multiplier for performing integer based
operations.

In assembling an LSI for elliptic curve
cryptography, although the overall arrangement becomes
identical or similar to that of the LSI shown in FIG. 23,
a coprocessor for performing finite field GF(2^m)
arithmetic operations must be prepared instead of a
coprocessor for performing integer based operations.

FIG. 25 is a block diagram showing the hardware
arrangement of a coprocessor for performing finite field
GF(2^m) arithmetic operations with a polynomial base.

FIG. 25 shows a kind of arithmetic apparatus for a
finite field GF(2^m) called a cyclotomic field using the
special irreducible polynomial disclosed in "Hardware
Implementation of Elliptic Curve Cryptosystem",
SCIS'98-10.1.C. This arithmetic apparatus has an
arrangement capable of executing addition, square,
multiply, and inverse operations on a finite field
GF(2^m). With this arrangement, a finite field GF(2^m)
arithmetic operation required to compute a point on an
elliptic curve is executed. By integrating such an
arithmetic apparatus into an IC, a coprocessor for
finite field GF(2^m) arithmetic operations which can be
applied to the LSI in FIG. 23 can be obtained.

In this case, each of adder and multiplier circuits
is constituted by m EX-ORs, and a multiplier circuit 81
is implemented by the circuit arrangement shown in
FIG. 26.

FIG. 26 shows a finite field GF(2^m) based
multiplier circuit called a cyclotomic field.

The multiplier circuit 81 has m-bit input registers 
A and B. The multiplier circuit 81 inputs the 
coefficients of a polynomial a(x) as fixed values to the 
input register A and computes while shifting the 
coefficients of a polynomial b(x) from the most 
significant bit in response to respective clocks. 
Referring to FIG. 26, reference symbols D denote 
flip-flops constituting a feedback register. When m 
shifts are made, the values of the respective blocks D 
are loaded into an output register C, thus obtaining 
a(x)*b(x) as an operation result. 

As is obvious from the comparison between the
circuits shown in FIGS. 24 and 26, an integer based
multiply operation and a finite field GF(2^m) arithmetic
operation of a polynomial base totally differ in their
architectures for executing multiply operations.
Attempts have therefore been made to form different
hardware arrangements for the respective cryptosystems.

For a finite field GF(2^m) based modular
multiplication, which is a fundamental operation for an
elliptic curve cryptosystem, an arithmetic apparatus
using a linear feedback shift register (LFSR) as a
divide circuit using a polynomial f(x) on a finite
field GF(q^m) is widely used. The modulo polynomial f(x)
is:

f(x) = f_m x^m + f_{m-1} x^{m-1} + ... + f_1 x + f_0,  f_m = 1

FIG. 27 is a block diagram showing the arrangement
of a linear feedback shift register LFSR. In this LFSR
90, EX-OR adders 91_1 to 91_m and 1-clock delay elements
(to be referred to as registers hereinafter) 92_1 to 92_m
are alternately cascaded from the input side. In this
arrangement, the output extracted from the mth register
92_m is separately fed back to the m adders 91_1 to 91_m
through coefficient units 93_1 to 93_m.

This LFSR 90 operates on a unit time (clock) basis.
In the shift register, advancing an operation clock
pulse by one clock is referred to as making a shift, and
the number m of registers 92_1 to 92_m incorporated in
the shift register is referred to as the number of
stages of the shift register.

When q = 2, a 1-bit flip-flop can be applied to
each of the registers 92_1 to 92_m. Each of the
coefficient units 93_1 to 93_m multiplies by "1" or "0".
When 1 is multiplied, the corresponding coefficient unit
is connected, whereas when 0 is multiplied, the
corresponding coefficient unit is not connected. For
each of the adders 91_1 to 91_m, a 2-input EX-OR is used.

In this LFSR 90, as the coefficients of a dividend
polynomial are sequentially input from the input side
(left side) from the higher orders, the coefficients of
a quotient polynomial are sequentially output from the
output side (right side) from higher orders. In this
case, the contents of the respective registers
(flip-flops) 92_1 to 92_m upon completion of input of the
0th-order term of the dividend polynomial are the
coefficients of a remainder polynomial.
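
The division behavior described above can be illustrated with a
small software model. This is only a sketch for q = 2, assuming a
monic modulo polynomial (f_m = 1); the function name lfsr_divide
and the list-based register model are hypothetical and are not
taken from FIG. 27.

```python
def lfsr_divide(dividend_bits, f_low):
    """Model of an m-stage LFSR divider over GF(2).

    dividend_bits: dividend coefficients, highest order first.
    f_low: [f_0, ..., f_{m-1}] of the modulo polynomial (f_m = 1 assumed).
    Returns (quotient bits highest order first, register contents r_0..r_{m-1}).
    """
    m = len(f_low)
    regs = [0] * m                      # registers 92_1 .. 92_m
    outputs = []
    for bit in dividend_bits:           # one shift per input coefficient
        fb = regs[m - 1]                # output of the m-th register
        outputs.append(fb)
        # Each adder XORs the previous stage with the gated feedback.
        regs = [bit ^ (f_low[0] & fb)] + [
            regs[i - 1] ^ (f_low[i] & fb) for i in range(1, m)
        ]
    # The first m outputs only flush the initial all-zero state.
    return outputs[m:], regs


# x^6 divided by f(x) = x^4 + x + 1
quotient, remainder = lfsr_divide([1, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0])
print(quotient, remainder)
```

In this example the registers end up holding x^3 + x^2, the
residue of x^6 modulo x^4 + x + 1, and the number of registers is
fixed by the degree m, which is the limitation discussed next.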

In the arithmetic apparatus using the above LFSR 90,
however, registers 92_1 to 92_m equal in number to the
degree m are required, and hence the
arrangement of the registers 92_1 to 92_m is limited by
the degree m. If, therefore, the degree m increases,
the LFSR must be modified for each arithmetic apparatus.

Although attention is currently given to elliptic
curve cryptosystems, RSA cryptosystems are still in the
mainstream. It is therefore strongly required that even
IC cards using elliptic curve cryptosystems comply with
RSA cryptosystems.

When both a conventional integer based cryptosystem
and a finite field GF(2^m) based cryptosystem are to be
incorporated in the same IC card, coprocessors
corresponding to the respective cryptosystems must be
incorporated in the IC card according to the
conventional technique. When a plurality of coprocessors
are incorporated in the IC card, the available chip area
of the IC card, which is severely limited in terms of
area, is undesirably reduced.

In a finite field GF(2^m) based modular
multiplication, as the degree m increases, the LFSR must
be modified for each arithmetic apparatus, thus imposing
limitations in terms of hardware.

BRIEF SUMMARY OF THE INVENTION 
It is an object of the present invention to provide
an arithmetic method and apparatus and a crypto
processing apparatus which can execute arithmetic
operations without modifying the apparatus
configurations even if a degree m of a finite field
GF(2^m) increases.

It is another object of the present invention to
provide an arithmetic apparatus and crypto processing
apparatus which can execute a finite field GF(2^m)
arithmetic operation as well as an integer based
operation by only adding minimum architectures.

According to the present invention, there is
provided an arithmetic apparatus which operates a unit
arithmetic circuit while propagating a carry in an
integer based unit arithmetic operation, and operates
the unit arithmetic circuit without propagating any
carry in a finite field GF(2^m) based unit arithmetic
operation.

According to the present invention, a finite field
GF(2^m) arithmetic operation can be executed as well as
an integer based operation by only adding a minimum
architecture.

According to the present invention, there is
provided an arithmetic apparatus comprising an integer
based unit arithmetic circuit, a finite field GF(2^m)
based unit arithmetic circuit logically adjacent to the
integer based unit arithmetic circuit, and a selector
for selecting the integer based unit arithmetic circuit
or the finite field GF(2^m) based unit arithmetic
circuit.

According to the present invention, both an integer
based unit multiply operation and a finite field GF(2^m)
based multiply operation can be executed by only adding
a finite field GF(2^m) based unit arithmetic circuit.

According to the present invention, the arithmetic
apparatus comprises an integer based unit arithmetic
circuit and a selection control circuit which outputs,
to the integer based unit arithmetic circuit, a
selection signal for selecting an integer based unit
arithmetic operation or a finite field GF(2^m) based
unit arithmetic operation. In addition, the integer
based unit arithmetic circuit comprises a carry
propagation control circuit which, in executing a long
product-sum operation, propagates a carry upon reception
of a selection signal instructing an integer based unit
arithmetic operation, and propagates no carry upon
reception of a selection signal instructing a finite
field GF(2^m) based unit arithmetic operation. In this
apparatus, the integer based arithmetic mode and finite
field GF(2^m) based arithmetic mode can be switched by
controlling carry propagation in the unit arithmetic
circuit.

According to the present invention, both an integer
based arithmetic operation and finite field GF(2^m)
based arithmetic operation can be executed by only
adding the carry propagation control circuit.

According to the present invention, there is
provided an arithmetic apparatus comprising a carry
propagation control circuit which performs carry
propagation control in a full adder in units of bits by
using a switch to which a selection signal and a
carry-out signal are input.

In the arithmetic apparatus of the present
invention, the carry propagation control circuit
comprises a selector which switches between outputting
an EX-OR result of two inputs in a full adder in units
of bits as an addition result and outputting an EX-OR
result of that result and an input carry as an addition
result.
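
The carry propagation control described above can be modeled in
a few lines. This is a hedged sketch, not the circuit of the
embodiments: the names full_adder_bit, ripple_add, and the
boolean propagate_carry (standing in for the selection signal)
are assumptions made for illustration.

```python
def full_adder_bit(a, b, carry_in, propagate_carry):
    """One bit of a full adder with a carry control switch."""
    half = a ^ b                            # EX-OR of the two inputs
    if propagate_carry:                     # integer based mode
        total = half ^ carry_in             # EX-OR with the input carry
        carry_out = (a & b) | (half & carry_in)
    else:                                   # finite field GF(2^m) mode
        total = half                        # carry is ignored
        carry_out = 0                       # and never propagated
    return total, carry_out


def ripple_add(x, y, width, propagate_carry):
    """Ripple-carry addition of two width-bit values, bit by bit."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder_bit((x >> i) & 1, (y >> i) & 1,
                                    carry, propagate_carry)
        result |= bit << i
    return result, carry


print(ripple_add(0b0111, 0b0001, 4, True))   # integer addition
print(ripple_add(0b0111, 0b0001, 4, False))  # coefficient-wise EX-OR
```

With the carry suppressed, the same adder computes the bitwise
EX-OR of its operands, which is addition of polynomials whose
coefficients are in GF(2).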

According to the present invention, there is
provided an arithmetic apparatus comprising an adder
circuit for adding by propagating a carry when executing
an integer based multiply operation, and adding without
propagating any carry when executing a finite field
GF(2^m) based multiply operation.

According to the present invention, both an integer
based multiply operation and finite field GF(2^m) based
arithmetic operation can be reliably executed with
respect to an addition portion of a product-sum
operation.

According to the present invention, there is
provided a crypto processing apparatus capable of
switching between encryption or decryption based on an
integer based operation performed by an arithmetic
apparatus and encryption or decryption based on a finite
field GF(2^m) based arithmetic operation performed by
the arithmetic apparatus.

The present invention can perform both crypto
processing based on an integer based operation such as
an RSA crypto operation and crypto processing based on a
finite field GF(2^m) based arithmetic operation such as
an elliptic curve crypto operation.

According to the present invention, there is
provided an arithmetic apparatus comprising an
arithmetic section including a long product-sum
operation circuit capable of executing a modular
multiplication with a polynomial base expression of a
finite field GF(2^m) and a control section which
controls the product-sum operation circuit to execute a
multiply processing and a modulo processing.

According to the arithmetic apparatus of the
present invention, since the long product-sum operation
circuit performs the modulo operation in place of a
linear feedback shift register, an arbitrary degree
equal to or larger than 1 can be used. Even if,
therefore, the degree of a finite field GF(2^m)
increases, an arithmetic operation can be executed
without modifying the apparatus configuration.

According to the present invention, the product-sum
operation circuit comprises a single precision
multiplier circuit for multiplying polynomial data of a
finite field GF(2^m) based polynomial base without
propagating any carry, and a double precision adder
circuit for adding by using the multiply result obtained
by the multiplier circuit, and the control unit controls
the multiplier circuit and adder circuit in multiply
processing.

According to the present invention, there is
provided an arithmetic apparatus comprising a quotient
acquisition circuit which is controlled by the control
unit, sets the multiply result of two polynomial data as
first dividend polynomial data in a modulo, sets
predetermined modulo polynomial data as divisor
polynomial data, performs quotient calculation on the
basis of the first or subsequent dividend polynomial
data and the divisor polynomial data, and acquires
1-block quotient polynomial data with the number of bits
corresponding to a bus width from an upper order. In
this arithmetic apparatus, the control unit controls the
quotient acquisition circuit in a modulo, and controls
the multiplier circuit and adder circuit when 1-block
quotient polynomial data is acquired. With this
operation, next dividend polynomial data is calculated
by subtracting the multiply result of the 1-block
quotient polynomial data and divisor polynomial data
from the current dividend polynomial data, and the
processing, from controlling the quotient acquisition
circuit to calculating dividend polynomial data, is
repeated, thereby obtaining residue data.
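
The repeated "acquire a quotient block, multiply it by the
divisor, subtract from the dividend" loop can be sketched in
software as follows. This is an illustrative model only. It
assumes polynomials over GF(2) packed into Python integers and a
hypothetical block width w, and it obtains each quotient block
by plain long division on the leading bits rather than by the
inverse-data method of the quotient acquisition circuit. The
names clmul, quotient_block, and blockwise_mod are assumptions.

```python
def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of two bit vectors."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def quotient_block(d, f, m, top, k):
    """Return the k quotient bits for dividend positions top .. top-k+1."""
    q = 0
    for i in range(k):
        pos = top - i
        if (d >> pos) & 1:
            q |= 1 << (k - 1 - i)
            d ^= f << (pos - m)     # local copy only; caller's d is unchanged
    return q


def blockwise_mod(d, f, w=8):
    """Residue of d modulo f, removing up to w dividend bits per pass."""
    m = f.bit_length() - 1          # degree of the divisor polynomial
    top = d.bit_length() - 1        # highest dividend position still to clear
    while top >= m:
        k = min(w, top - m + 1)
        q = quotient_block(d, f, m, top, k)
        # Over GF(2), subtracting q(x)*f(x) is the same as XOR-ing it in.
        d ^= clmul(q, f) << (top - k + 1 - m)
        top -= k
    return d


print(bin(blockwise_mod(0b1000000, 0b10011, w=2)))  # x^6 mod (x^4 + x + 1)
```

Each pass shortens the dividend by one block, which matches the
block-by-block decrease of the dividend polynomial data noted
in the description.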

In this arithmetic apparatus, every multiply result
of 1-block quotient polynomial data and divisor
polynomial data becomes (m + 1) blocks.

In addition, this multiply result is subtracted from
(which over GF(2) is the same as added to) the current
dividend polynomial to calculate the next dividend
polynomial data of (2m - 1*n) blocks (n is the number
of multiply operations performed). That is, the previous
dividend polynomial data is decreased in units of
blocks.

With the above control unit, the present invention
can realize efficient modulo and quotient calculation by
utilizing the characteristics of hardware.

In a quotient calculation, the quotient acquisition
circuit of the present invention multiplies the inverse
data of the upper two blocks of divisor polynomial data
and the current dividend polynomial data, and sets the
second upper block of the multiply result as 1-block
quotient polynomial data.

With the above quotient acquisition circuit, the
present invention can extract an effective number
portion from the obtained quotient polynomial, and hence
can optimize operation precision.

According to the present invention, there is 
provided an arithmetic apparatus comprising a quotient 
acquisition circuit for calculating inverse data from 
the upper two blocks of divisor polynomial data and 
storing the data in a memory when acquiring quotient 
polynomial data in a first operation, and reading out 
the inverse data from the memory and using it when 
acquiring quotient polynomial data in a second or 
subsequent operation. 

According to the present invention, with the above
quotient acquisition circuit, when a modulo is
executed repeatedly under the same modulo polynomial, a
quotient can be acquired by reading out inverse data
from the memory. Therefore, the time required to
calculate inverse data can be saved in the second and
subsequent quotient calculations, and the processing
time for a finite field GF(2^m) arithmetic operation can
be shortened.

If the inverse data is calculated in advance, a finite
field GF(2^m) based modular multiplication can be
realized by using only the product-sum operation circuit
for performing multiply and addition operations.

According to the present invention, there is
provided an arithmetic apparatus comprising a quotient
acquisition circuit for, in calculating inverse data,
counting the number of consecutive 0s from high-order
bits of the upper two blocks of divisor polynomial data,
extracting polynomial data of 1 block + 1 bit from
high-order bits such that the most significant bit is
set to 1, obtaining the inverse of the extracted
polynomial data, obtaining 2-block data as a whole by
concatenating corrected data whose least significant bit
is 1 and other bits are 0 to the most significant bit of
the obtained inverse, and setting, as inverse data, a
result obtained by bit-shifting the data to the
high-order side by the count of 0s.

With the above quotient acquisition circuit, the
present invention uses a corrected value as inverse data
to avoid normalization of a divisor, correction of an
approximate quotient, and denormalization of an
operation result such as a quotient or residue based on
the Knuth algorithm (reference: Knuth, D. E., "The Art
of Computer Programming", Vol. 2, Reading, Mass.:
Addison Wesley, 2nd edition, (1981)) using a single
precision divide operation. The number of times of bit
shifts can therefore be decreased, and the arithmetic
apparatus can be optimized.

According to the present invention, there is
provided a crypto processing apparatus for encrypting or
decrypting based on a finite field GF(2^m) based modular
multiplication by the arithmetic apparatus.

With the arithmetic apparatus, the present
invention can encrypt or decrypt based on a finite field
GF(2^m) based modular multiplication such as an elliptic
curve crypto operation.

Additional objects and advantages of the invention
will be set forth in the description which follows, and
in part will be obvious from the description, or may be
learned by practice of the invention. The objects and
advantages of the invention may be realized and obtained
by means of the instrumentalities and combinations
particularly pointed out hereinafter.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

The accompanying drawings, which are incorporated
in and constitute a part of the specification,
illustrate presently preferred embodiments of the
invention, and together with the general description
given above and the detailed description of the
preferred embodiments given below, serve to explain the
principles of the invention.

FIG. 1 is a block diagram showing an example of the
arrangement of an arithmetic apparatus according to the
first embodiment of the present invention;

FIGS. 2A, 2B, and 2C are views showing an example
of the arrangement of a 4×4-bit unit arithmetic circuit
which implements c'(x) = a(x)·b(x);

FIGS. 3A, 3B, 3C and 3D are views showing an
example of the arrangement of a 4×4-bit unit arithmetic
circuit which implements an integer based multiply
operation;

FIG. 4 is a block diagram showing an example of the 
arrangement of a 4-bit ripple carry type full adder with 
a carry control function which is used in the 
coprocessor of the first embodiment; 

FIG. 5 is a circuit diagram showing an example of 
the arrangement of a full adder and carry control switch 
which are used in an adder circuit in the first
embodiment;

FIG. 6 is a circuit diagram showing a modification 
of the full adder with the carry control function; 

FIG. 7 is a circuit diagram showing another 
modification of the full adder with the carry control 
function; 

FIG. 8 is a block diagram showing an example of the 
arrangement of an arithmetic apparatus according to the 
second embodiment of the present invention; 

FIGS. 9A and 9B are views showing an example of the
arrangement of a circuit which
implements a multiplier circuit in the second
embodiment;

FIG. 10 is a block diagram showing an example of
the arrangement of a coprocessor applied to an
arithmetic apparatus and crypto processing apparatus
according to the third embodiment of the present
invention;

FIG. 11 is a schematic view showing the arrangement
of a quotient acquisition circuit in the third
embodiment;

FIG. 12 is a schematic view for explaining the
function of an inverse calculator section in the third
embodiment;

FIG. 13 is a schematic view showing the arrangement
of the inverse calculator section in the third
embodiment;

FIG. 14 is a flow chart for explaining a modular
multiplication for a finite field GF(2^m) based
polynomial base in the third embodiment;

FIG. 15 is a schematic view showing calculation on 
paper to explain a modulo in the third embodiment; 

FIG. 16 is a schematic view showing the processing 
performed by an arithmetic unit in the third embodiment; 

FIG. 17 is a view showing the required numbers of 
clocks for commands in the third embodiment; 

FIG. 18 is a view showing the required numbers of
clocks for GF(2^160) operations in the third embodiment;

FIG. 19 is a view showing the circuit sizes of 
coprocessors in the third embodiment; 

FIG. 20 is a view showing additional circuit sizes 
in the third embodiment; 

FIG. 21 is a view showing the circuit sizes of
coprocessors designed specifically for GF(2^m)
operations for the sake of comparison in the third
embodiment;

FIG. 22 is a block diagram showing an example of 
the arrangement of a coprocessor applied to an 
arithmetic apparatus and crypto processing apparatus 
according to the fourth embodiment of the present 
invention; 

FIG. 23 is a block diagram showing an IC card LSI
including a crypto processing coprocessor;

FIG. 24 is a block diagram showing an example of 
the arrangement of a coprocessor portion of an LSI for 
performing an integer based operation; 

FIG. 25 is a block diagram showing an example of
the hardware arrangement of a coprocessor for performing
a finite field GF(2^m) arithmetic operation of a
polynomial base;

FIG. 26 is a block diagram showing a finite field
GF(2^m) based multiplier circuit called a cyclotomic
field; and

FIG. 27 is a block diagram showing the arrangement
of a general linear feedback shift register.

DETAILED DESCRIPTION OF THE INVENTION 
Each embodiment of the present invention will be 
described below with reference to the views of the 
accompanying drawing. 
(First Embodiment) 

FIG. 1 is a block diagram showing an example of the
arrangement of an arithmetic apparatus according to the
first embodiment of the present invention.

An arithmetic apparatus of this embodiment which is 
formed as a coprocessor 1 is a long product-sum 
multiplier apparatus capable of both an integer based
multiply operation and a finite field GF(2^m) based
multiply operation. This apparatus executes other
operations such as addition, square, and inverse 
operations by controlling this multiply processing. By 
incorporating this arithmetic apparatus in an LSI or the 
like, a crypto processing apparatus capable of realizing 
both an RSA cryptosystem and elliptic curve cryptosystem 
is formed. In this case, for example, the LSI in which 
the arithmetic apparatus is to be incorporated is the 
apparatus shown in FIG. 23. 

In this coprocessor 1, an arithmetic unit 4 is
controlled by a controller unit 5 to input/output data,
through a 32-bit data bus 3, to/from a memory 2 for
storing data in the process of an operation.

Input data from the data bus 3 are stored in buffers
17X, 17Y, and 17Z, and data to be output to the data bus
3 is stored in a buffer 17R.

Input data X and Y are multiplicand/multiplier data.
Of these data, the data Y is input to a buffer as data
divided in units of predetermined digits to prevent a
multiply operation of many digits from being performed
at once. Data Z is an interim result which is produced
because a multiply operation is executed in a plurality
of steps. This data is added to the product XY, and
overflow called a carry C is added to the sum, thus
completing one cycle. Data R obtained by removing the
carry from the resultant data is output to the data bus
3 through the buffer 17R to be used as the data Z for an
operation in the next cycle. By repeating this cycle a
plurality of times, a long integer multiply
operation or finite field GF(2^m) based multiply
operation (strictly speaking, c'(x), to be described
later) is performed.

To realize the above operation, in addition to the
buffers 17X, 17Y, 17Z, and 17R, the coprocessor 1
includes an integer based multiplier circuit 11, a
finite field GF(2^m) based multiplier circuit 12, a
selector 13, an adder circuit 14, an adder circuit 15, a
carry holder 16, and the controller unit 5.

The integer based multiplier circuit 11 performs an
integer based multiply operation for the data X in a
buffer 17X and the data Y in a buffer 17Y, and outputs
the result to the selector 13.

The finite field GF(2^m) based multiplier circuit
12 executes part (c'(x)) of a finite field GF(2^m) based
multiply operation by using the data X in the buffer 17X
and the data Y in the buffer 17Y, and outputs the result
to the selector 13.

The selector 13 outputs the data output from the
integer based multiplier circuit 11 or finite field
GF(2^m) based multiplier circuit 12 to the adder circuit
14 in accordance with a signal S1 from the controller
unit 5.

The adder circuit 14 is a full adder, which adds
the data Z in the buffer 17Z to the selector output and
outputs the sum to the adder circuit 15. In this adder
circuit 14, integer based addition and finite field
GF(2^m) based addition are switched in accordance with
the control signal S1. This addition switching will be
described later.

The adder circuit 15 adds the carry C held in the
carry holder 16 to the output from the adder circuit 14.
The adder circuit 15 then outputs the upper 32 bits of
the sum as the next carry C to the carry holder 16, and
also outputs, to a buffer 17R, the lower 8 bits as the
data R which is the operation result in this cycle. In
the adder circuit 15 as well, integer based addition and
finite field GF(2^m) based addition are switched in
accordance with the control signal S1.

The carry holder 16 holds the carry C output from
the adder circuit 15, and supplies the held carry C to
the adder circuit 15 in the next operation cycle.
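
The cycle formed by the multiplier circuits, the adder circuits
14 and 15, and the carry holder 16 can be sketched as a
word-serial loop computing R = X·Y + Z + C. The sketch below is
a simplified model under stated assumptions: it uses a uniform
32-bit word for X, Y, Z, and R, so it does not reproduce the
exact bit widths given in the text, and the names clmul,
to_words, from_words, long_multiply, and gf2_mode (standing in
for the control signal) are hypothetical.

```python
W = 32                      # assumed word width of one block
MASK = (1 << W) - 1


def clmul(a, b):
    """Carry-less product, i.e. multiplication without carry propagation."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def to_words(value, count):
    """Split an integer into count words, least significant word first."""
    return [(value >> (W * i)) & MASK for i in range(count)]


def from_words(words):
    """Reassemble words (least significant first) into one integer."""
    return sum(word << (W * i) for i, word in enumerate(words))


def long_multiply(x_words, y_words, gf2_mode):
    """Long product-sum multiply; gf2_mode selects carry-less arithmetic."""
    mul = clmul if gf2_mode else (lambda a, b: a * b)
    add = (lambda a, b: a ^ b) if gf2_mode else (lambda a, b: a + b)
    z = [0] * (len(x_words) + len(y_words))      # interim result Z
    for j, y in enumerate(y_words):
        carry = 0                                # carry holder
        for i, x in enumerate(x_words):
            total = add(add(mul(x, y), z[i + j]), carry)
            z[i + j] = total & MASK              # data R of this cycle
            carry = total >> W                   # overflow kept as next C
        z[j + len(x_words)] = carry
    return z


a, b = 0x0123456789ABCDEF0F1E2D3C, 0xFEDCBA9876543210A5A5A5A5
print(hex(from_words(long_multiply(to_words(a, 3), to_words(b, 3), False))))
print(hex(from_words(long_multiply(to_words(a, 3), to_words(b, 3), True))))
```

Note that even in the carry-less mode the high half of each
partial product still has to be passed to the next word; only
the bit-level carries inside the multiply and add steps are
suppressed.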

The controller unit 5 comprises an integer
arithmetic controller 21 and finite field GF(2^m)
arithmetic controller 22. The controller unit 5
controls the arithmetic unit 4 in accordance with one of
these command groups. This command switching is
performed in accordance with a command from an external
CPU (e.g., the CPU in FIG. 23).

The integer arithmetic controller 21 controls the
arithmetic unit 4 to make it operate as a long integer
based multiplier. For this purpose, the control signal
S1 controls the selector 13 to output the data from the
integer based multiplier circuit 11 to the adder circuit
14, and also controls the adder circuits 14 and 15 to
make them operate as an integer based adder circuit.
The integer arithmetic controller 21 executes other
arithmetic processes such as the four fundamental
operations of arithmetic by controlling the operation of
the arithmetic unit 4 as an integer based multiplier.

The finite field GF(2^m) arithmetic controller 22
controls the arithmetic unit 4 to operate as a finite
field GF(2^m) based multiplier. For this purpose, the
control signal S1 controls the selector 13 to output the
data output from the finite field GF(2^m) based
multiplier circuit 12 to the adder circuit 14, and also
controls the adder circuits 14 and 15 to make them
operate as a finite field GF(2^m) based adder circuit.
In addition, the finite field GF(2^m) arithmetic
controller 22 realizes addition and square operations by
controlling the operation of the arithmetic unit 4 as a
finite field GF(2^m) based multiplier.

In order to realize the respective processes
described above, the controller unit 5 controls the
respective sections by outputting a control signal S2.

The operation of the arithmetic apparatus according 
to this embodiment having the above arrangement will be 

described next. 

In this arithmetic apparatus (coprocessor 1), the
multiplier circuit 12, selector 13, and the like are
incorporated in the integer based multiplier apparatus
to realize the processing to be performed by a finite
field GF(2^m) based multiplier apparatus. In this case,
in a finite field GF(2^m), an (m-1)-order
polynomial can be expressed using an m-bit vector by:

a(x) = a_{m-1} x^{m-1} + a_{m-2} x^{m-2} + ... + a_1 x + a_0   ...(1)
     = [a_{m-1}, ..., a_1, a_0]
b(x) = b_{m-1} x^{m-1} + b_{m-2} x^{m-2} + ... + b_1 x + b_0   ...(2)
     = [b_{m-1}, ..., b_1, b_0]
In this case, a finite field GF(2^m) based multiply
operation is a modular multiplication with an m-order
irreducible polynomial f(x) on GF(2^m) being set as a
modulus. In addition, a product c(x) of two elements
a(x) and b(x) of the extension field of GF(2) is defined
as:

c(x) = a(x)·b(x) mod f(x)   ...(3)
     = Σ_{k=0}^{m-1} a_k·x^k·b(x) mod f(x)
     = c_{m-1} x^{m-1} + c_{m-2} x^{m-2} + ... + c_1 x + c_0
     = [c_{m-1}, ..., c_1, c_0]
In addition, a modulo polynomial f(x) can be
expressed as:

f(x) = f_m x^m + f_{m-1} x^{m-1} + ... + f_1 x + f_0   ...(4)
     = [f_m, f_{m-1}, ..., f_1, f_0]
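
As a worked illustration of equations (1) to (4), the following
sketch stores each polynomial as an integer bit vector and
evaluates equation (3) term by term from the highest-order
coefficient of a(x). It is a bit-serial reference model only,
not the long product-sum method of this embodiment; the function
name gf2m_mul and the example field GF(2^4) with
f(x) = x^4 + x + 1 are assumptions chosen for illustration.

```python
def gf2m_mul(a, b, f):
    """Compute a(x)·b(x) mod f(x) over GF(2), with bit k holding coefficient k.

    a and b must have degree below m, where m is the degree of f.
    """
    m = f.bit_length() - 1
    c = 0
    for k in reversed(range(m)):
        c <<= 1                 # multiply the partial result by x
        if (c >> m) & 1:
            c ^= f              # reduce modulo f(x) when degree reaches m
        if (a >> k) & 1:
            c ^= b              # add a_k·b(x); addition is EX-OR
    return c


# (x^3 + 1)·(x^2 + x) mod (x^4 + x + 1)
print(bin(gf2m_mul(0b1001, 0b0110, 0b10011)))
```

In this example the unreduced product is
x^5 + x^4 + x^2 + x, and reducing with x^4 = x + 1 leaves
x + 1, i.e. the bit vector [0, 0, 1, 1].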
In a general finite field GF(2^m) based polynomial
multiply operation, as shown in FIG. 26, a shift
register based on a multiplier cycle shift operation is
formed, and a residue polynomial after m cycle shifts is
set as a multiply result. In this embodiment, however,
this processing is performed by slightly modifying a
long product-sum operation circuit widely used in an
integer based crypto processing LSI.

Note that when the coprocessor 1 operates as an
integer based arithmetic apparatus in accordance with
the control signal S1 from the controller unit 5, this
arithmetic apparatus functions as a long product-sum
operation circuit. In this long product-sum operation
circuit, the finite field GF(2^m) based multiplier
circuit 12 calculates equation (5) as part of a finite
field GF(2^m) based multiply operation upon switching
based on the control signal S1.

c'(x) = a(x) \cdot b(x)   ...(5)



Note that the finite field GF(2^m) based multiplier
circuit 12 does not calculate the portion "c'(x) mod
f(x)" in equation (6) in the step of calculating c'(x).
That is, c'(x) itself is computed in the same manner as
the product of two numbers in an integer based multiply
operation by only switching the multiplier circuit 12
and adder circuits 14 and 15 using the control signal S1.

Note that the m-bit multiplier and multiplicand of 
c'(x) = a(x) · b(x) are divided into 32-bit data and read
out from the memory, and the operation result is written 
in the memory in units of 32 bits. The final operation 
result becomes 2m-bit data. 

The integer based operation performed by the
integer based multiplier circuit 11 differs from the
finite field GF(2^m) based polynomial operation
performed by the finite field GF(2^m) based multiplier
circuit 12 in the presence/absence of a carry. In the
integer based operation, a logic expression of addition
is:

0 + 0 + Carry(=0) = 0, Carry = 0   ...(6)
1 + 0 + Carry(=0) = 1, Carry = 0
1 + 1 + Carry(=0) = 0, Carry = 1

In this manner, the operation must consider a carry from
a lower bit. In contrast to this, in a finite field
GF(2^m) based algebraic system, since each bit indicates
the coefficient of each term of a polynomial, no
consideration need be given to a carry to a term of a
different order.

In consideration of this, in this embodiment, each
integer based arithmetic unit (multiplier or adder) is
switched between the normal mode of allowing carry
propagation and the mode of executing no carry
propagation. In this case, the mode of inhibiting (not
executing) carry propagation is used to perform finite
field GF(2^m) arithmetic operation. The size of a
circuit to be added to switch the carry propagation
modes is small as compared with the total circuit size.
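
The effect of the two carry modes can be illustrated in
software. The following Python sketch is not the circuit
of the embodiment; it is a minimal model, assuming a
simple shift-and-add multiplier, in which a hypothetical
flag carry_enabled plays the role of the mode switch.
With carries allowed, partial products are summed as
integers; with carries inhibited, they are combined by
exclusive OR, which yields the polynomial product over
GF(2).

```python
def multiply(a: int, b: int, carry_enabled: bool) -> int:
    """Shift-and-add multiplier whose partial-product adder can drop carries.

    carry_enabled is a hypothetical stand-in for the mode switch:
    True  -> ordinary integer multiplication,
    False -> carry-less (polynomial over GF(2)) multiplication.
    """
    acc = 0
    shift = 0
    while b:
        if b & 1:
            partial = a << shift
            if carry_enabled:
                acc = acc + partial   # carries propagate between bit positions
            else:
                acc = acc ^ partial   # bitwise XOR: no carry between positions
        b >>= 1
        shift += 1
    return acc


if __name__ == "__main__":
    print(multiply(0b1011, 0b0110, carry_enabled=True))   # 66
    print(multiply(0b1011, 0b0110, carry_enabled=False))  # 58
```

In the second call the operands are read as x^3 + x + 1 and
x^2 + x, and the result 0b111010 is their product with
coefficients reduced modulo 2.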

FIGS. 2A, 2B, and 2C show an example of the
arrangement of a 4x4-bit unit arithmetic circuit that
implements c'(x) = a(x) · b(x).

The finite field GF(2^m) based multiplier circuit
12 in FIG. 1 is obtained by forming the unit operation
device shown in FIG. 2A into an 8x32-bit arrangement.
Note that the circuit shown in FIG. 2B corresponds to an
input section 29 of the circuit in FIG. 2A.

FIGS. 3A, 3B, 3C, and 3D show an example of the
arrangement of a 4x4-bit unit arithmetic circuit that
implements an integer based multiply operation.

The integer based multiplier circuit 11 shown in
FIG. 1 is obtained by forming the unit arithmetic
apparatus in FIGS. 3A to 3D into an 8x32-bit arrangement.
FIG. 3C shows the arrangement of a full adder FA used in
FIG. 3A. FIG. 3D shows the arrangement of a carry 31 of
section 30 of the circuit in FIG. 3A.

In the arithmetic apparatus of this embodiment, the
finite field GF(2^m) based multiplier circuit 12 and
integer based multiplier circuit 11 are logically
adjacent to each other, and these circuits 11 and 12 are
selected in accordance with the control signal S1
generated from a finite field GF(2^m) arithmetic
operation command from the controller unit 5, thereby
performing appropriate processing.

An output from the selector 13 is input to the
adder circuit 14. In this case, the Z + (Y*X) adder
circuit 14 is a full adder for adding 40-bit data (Y*X)
and 8-bit data Z. In this case as well, finite field
GF(2^m) based addition is realized by adding a switch
that inhibits a carry produced by adding the respective
bits from being propagated to the next stage in
accordance with the above control signal.

FIG. 4 is a block diagram showing an example of the
arrangement of a 4-bit ripple carry type full adder
having a carry control function which is used in the
coprocessor in this embodiment.

The adder circuit 14 shown in FIG. 1 is obtained by
extending the full adder having this arrangement into a
circuit capable of adding 40-bit data and 8-bit data.
In the circuit shown in FIG. 4, switches 33 are
arranged between full adders 32 to control carry
propagation.

FIG. 5 shows an example of the arrangement of a
full adder and carry control switch which are used in
the adder circuit in this embodiment.

The full adder 32 and switch 33 constitute a full
adder 42 having a carry control function for one bit.
In this case, the full adder 32 has the same arrangement
as that of the full adder FA in FIG. 3C, and the carry
31 in the full adder 32 has the same arrangement as that
of the carry in FIG. 3D.

The switch 33 connected to a carry propagation line
in the full adder 32 is controlled by the control signal
S1 from the controller unit 5. When an integer based
operation is to be performed, the switch 33 is connected.
When a finite field GF(2^m) arithmetic operation is to
be performed, the switch 33 is disconnected.
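
The behavior of a ripple carry adder with such switches
can be modeled bit by bit. The following Python sketch is
an illustrative model only, not the circuit of FIG. 4 or
FIG. 5; the function names and the flag switch_connected
are assumptions introduced here. Each stage is an
ordinary full adder, and the carry handed to the next
stage is forced to 0 when the switch is disconnected.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, switch_connected: bool):
    """Add two equal-length, LSB-first bit lists through a chain of full adders.

    switch_connected models the carry switch between stages (hypothetical name):
    True  -> carries ripple to the next stage (integer addition),
    False -> carries are cut, so every stage reduces to a XOR b.
    """
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, cout = full_adder(a, b, carry)
        out.append(s)
        carry = cout if switch_connected else 0
    return out, carry


def to_bits(value: int, width: int):
    """LSB-first list of bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    a, b = to_bits(7, 4), to_bits(1, 4)
    print(from_bits(ripple_add(a, b, True)[0]))    # 8 (integer sum)
    print(from_bits(ripple_add(a, b, False)[0]))   # 6 (7 XOR 1)
```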

The output (Z + (Y*X)) from the adder circuit 14
having the above arrangement is propagated to the adder
circuit 15.

The C + Z + (Y*X) adder circuit 15 on the last
stage of the arithmetic operation block outputs the
lower 8 bits of the 40 bits as the multiply result as
the data R, and adds the upper 32 bits to Z + (Y*X) in
the next cycle.

Similar to the adder circuit 14, the adder circuit
15 is a full adder having a carry control function shown
in FIG. 4 which is controlled by the control signal S1.
In the integer based operation mode, the adder circuit
15 serves as a full adder adjusted to the LSB to execute
integer based addition. In the finite field GF(2^m)
arithmetic operation mode, the adder circuit 15 executes
finite field GF(2^m) based addition.

The output data R from the adder circuit 15 is
temporarily stored in the memory 2 through the data bus
3. This data becomes the data Z again and returns to
the coprocessor 1, and an integer based multiply
operation or finite field GF(2^m) based multiply
operation is continued. This operation is repeated by
the number of times corresponding to the required number
of cycles, thereby obtaining a multiply result.
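
The cycle-by-cycle product-sum described above can be
sketched in software as follows. This is a simplified
model under stated assumptions, not the data path of
FIG. 1: Y is taken to be the 8-bit operand and X the
32-bit operand, Y and Z are supplied one byte per cycle
from the least significant byte, and the names
product_sum, clmul, and assemble are hypothetical. In
each cycle the lower 8 bits are output as R and the upper
32 bits are carried over to the next cycle as C.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b (polynomial product over GF(2))."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def product_sum(x32: int, y_bytes, z_bytes, gf2_mode: bool):
    """Compute Z + Y*X one byte of Y and Z per cycle (LSB byte first).

    gf2_mode is a hypothetical stand-in for the control signal:
    False -> integer arithmetic, True -> carry-less GF(2) arithmetic.
    Returns (list of output bytes R, final carry-over value C).
    """
    carry_over = 0                      # upper 32 bits kept for the next cycle
    r_bytes = []
    for y, z in zip(y_bytes, z_bytes):
        if gf2_mode:
            t = z ^ clmul(y, x32) ^ carry_over
        else:
            t = z + y * x32 + carry_over
        r_bytes.append(t & 0xFF)        # lower 8 bits are output as data R
        carry_over = t >> 8             # upper 32 bits go to the next cycle
    return r_bytes, carry_over


def assemble(r_bytes, carry_over: int) -> int:
    """Rebuild the full result from the output bytes and the last carry-over."""
    low = int.from_bytes(bytes(r_bytes), "little")
    return low | (carry_over << (8 * len(r_bytes)))


if __name__ == "__main__":
    x, y, z = 0xDEADBEEF, [0x34, 0x12], [0xFF, 0x00]
    print(hex(assemble(*product_sum(x, y, z, gf2_mode=False))))
    print(hex(assemble(*product_sum(x, y, z, gf2_mode=True))))
```

The same loop serves both modes; only the adder and
multiplier primitives change, which mirrors the idea of
sharing one product-sum structure between the two kinds
of arithmetic.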

In this case, the result of equation (5) can be
obtained in accordance with a finite field GF(2^m) based
multiply command, and a finite field GF(2^m) based
multiply operation is completed by a modular
multiplication with the irreducible polynomial f(x) as a
modulus, as defined by equation (6). Similar to
division on paper, the modular multiplication may be
performed by repeating the processing of acquiring a
quotient from the upper digits of a dividend and
subtracting the product of the current quotient and the
divisor from the current dividend (in an extension field
of 2, subtraction and addition are performed in the same
manner) by the number of times corresponding to the
required number of cycles. This processing can be
realized by executing a finite field GF(2^m) based
multiply command and addition command (this operation
will be described in detail in the third embodiment). A
finite field GF(2^m) based square operation can be
realized by the same processing as that for a multiply
operation. An inverse operation can be realized by
alternately repeating multiply and square operations.
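
One well-known way of obtaining an inverse from multiply
and square operations alone is to compute a^(2^m - 2),
since 2^m - 2 = 2 + 4 + ... + 2^(m-1). The specification
does not state which sequence is used, so the following
Python sketch is only an assumed illustration; the names
gf_mul and gf_inv are hypothetical, and the reduction is
done by a simple bit-serial loop rather than by the
method of the third embodiment.

```python
def gf_mul(a: int, b: int, f: int, m: int) -> int:
    """Multiply a and b in GF(2^m), where f is the degree-m modulo polynomial."""
    p = 0
    while b:                       # carry-less product c'(x) = a(x) * b(x)
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    for i in range(p.bit_length() - 1, m - 1, -1):
        if (p >> i) & 1:           # cancel the term x^i using f(x) * x^(i - m)
            p ^= f << (i - m)
    return p


def gf_inv(a: int, f: int, m: int) -> int:
    """Inverse of a nonzero element a, computed as a^(2^m - 2)."""
    result, t = 1, a
    for _ in range(m - 1):
        t = gf_mul(t, t, f, m)            # square: t = a^(2^j)
        result = gf_mul(result, t, f, m)  # multiply the partial result by t
    return result


if __name__ == "__main__":
    # GF(2^8) with f(x) = x^8 + x^4 + x^3 + x + 1, chosen only as an example.
    inv = gf_inv(0x53, 0x11B, 8)
    print(hex(inv), gf_mul(0x53, inv, 0x11B, 8))
```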

A case wherein the arithmetic unit 4 functions as a
finite field GF(2^m) based adder in accordance with a
finite field GF(2^m) based addition command will be
described below.

Similar to general polynomial addition, finite
field GF(2^m) based addition is performed by adding the
coefficients of the same order as per

c(x) = a(x) + b(x)   ...(7)
     = [a_{m-1} + b_{m-1}, a_{m-2} + b_{m-2}, \ldots, a_0 + b_0]

In this case, the sum of the coefficients of
the respective orders is 0 + 0 = 1 + 1 = 0 and
0 + 1 = 1 + 0 = 1, and hence, no carry is produced
unlike in integer based addition. Therefore, finite
field GF(2^m) based addition can be generally
implemented by m EX-ORs.

In an integer based multiplier apparatus, addition
can be handled as c = b + a*1. In this embodiment,
therefore, finite field GF(2^m) based addition is also
executed as c(x) = b(x) + a(x)*1 by using this algorithm
without any change. This arithmetic operation can be
realized because the full adders shown in FIG. 4 are
used for the adder circuits 14 and 15.

In addition, with switching operation using the
control signal S1, the coprocessor 1 becomes a circuit
having the same function as that of the coprocessor
shown in FIG. 24, thus realizing an integer based
operation as well.

As described above, in the arithmetic apparatus
according to this embodiment of the present invention,
the integer based multiplier device includes the unit
multiplier device for an integer based multiply
operation and the unit arithmetic device for a finite
field GF(2^m) based multiply operation, which has a
circuit arrangement similar to that of the unit
multiplier apparatus, and a finite field GF(2^m)
arithmetic operation command is added to an integer
based multiply command. In addition, this apparatus
includes the selector controlled by a control signal
generated from a finite field GF(2^m) arithmetic command
and the switch for controlling the propagation of a
carry out of each bit of the full adder. The arithmetic
apparatus of the present invention can therefore execute
both integer based operation and finite field GF(2^m)
arithmetic operation without using any sequential finite
field GF(2^m) based multiplier device using a
conventional shift register.

A public-key crypto processing accelerator capable
of executing finite field GF(2^m) based addition and
multiply operations using a long product-sum operation
circuit can therefore be provided by adding small
numbers of instructions and circuits as additional
extension functions to a conventional integer based
arithmetic unit. Note that the circuit size required to
realize this embodiment is small as compared with the
total circuit size.

According to the crypto processing apparatus of
this embodiment, an LSI having abundant functions
capable of handling the finite field GF(2^m) based
elliptic curve cryptosystem as well as the integer based
RSA system can be provided as a crypto processing
coprocessor without specifically increasing the packing
area. An encryption/decryption apparatus capable of
handling both RSA and elliptic curve cryptosystem can be
implemented in even an apparatus having a small packing
area, such as an IC card.

Full adders which have carry control functions and 
constitute the adder circuits 14 and 15 shown in FIG. 4 
will be described. 

FIG. 6 shows another full adder having a carry 
control function. 

Like the circuit in FIG. 5, this full adder 43
having the carry control function comprises a switch 33
and full adder 32. In the circuit in FIG. 6, however,
the switch 33 is provided on the input side of the carry
31 unlike in the circuit in FIG. 5, in which the switch
33 is provided on the output side of the carry 31.

FIG. 7 shows still another full adder having a 
carry control function. 

This full adder 44 having the carry control
function performs carry control by controlling selection
of an addition result as an output. More specifically,
a switch 33' is a selector, which selects an output from
an EX-OR 35 or EX-OR 36 on the basis of the control
signal S1. A ripple carry type full adder obtained by
connecting such full adders can control carry
propagation in accordance with the control signal S1.

Assume that the control signal S1 in FIG. 7 is a
control signal based on a finite field GF(2^m)
arithmetic operation command. In this case, if the
signal S1 is "1", the output of the EX-OR 35 for inputs
a and b becomes the operation result. As a consequence,
the full adder 44 functions as a finite field GF(2^m)
based adder. If the signal S1 is "0", an output from the
full adder 44 becomes an operation result. As a
consequence, the full adder 44 functions as an integer
based adder.
(Second Embodiment) 

FIG. 8 shows an example of the arrangement of an
arithmetic apparatus according to the second embodiment
of the present invention. The same reference numerals
as in FIG. 1 denote the same parts in FIG. 8, and a
detailed description thereof will be omitted. Only
different portions will be described below. Note that a
repetitive description will be avoided in each
embodiment described below.

A coprocessor 1' as this arithmetic apparatus has
the same arrangement as that in the first embodiment
except that it has a multiplier circuit 41 in place of
the integer based multiplier circuit 11, finite field
GF(2^m) based multiplier circuit 12, and selector 13 in
FIG. 1.

This multiplier circuit 41 is designed to switch
between the integer based multiply mode and the finite
field GF(2^m) based multiply mode (only c'(x) in
equation (6)) in accordance with a control signal S1
from a controller unit 5.

FIGS. 9A and 9B show an example of the arrangement
of a 4x4-bit unit arithmetic circuit for realizing the
multiplier circuit of this embodiment. In practice, the
multiplier circuit 41 is realized by forming the unit
arithmetic device shown in FIGS. 9A and 9B into a device
having an 8x32-bit configuration. The circuit in
FIG. 9B shows an input section 29 of the circuit in
FIG. 9A.

As shown in FIG. 9A, the multiplier circuit 41 uses
the full adder 42 having the carry control function in
FIG. 5 as a full adder, and hence can control carry
propagation in accordance with the control signal S1.
Switching to the finite field GF(2^m) based multiply
mode can be realized by a finite field GF(2^m)
arithmetic operation command.

The arithmetic apparatus of this embodiment can
therefore operate in a manner similar to the first
embodiment.

As described above, the arithmetic apparatus and
crypto processing apparatus according to this embodiment
of the present invention use the multiplier circuit 41
in place of the integer based multiplier circuit 11,
finite field GF(2^m) based multiplier circuit 12, and
selector 13, and implement the functions of the
circuits 11, 12, and 13 by using one circuit 41. In
addition to effects similar to those of the first
embodiment, this embodiment can switch between the
integer based multiply operation and finite field
GF(2^m) based multiply operation by using fewer
additional circuits.

In this embodiment, the full adder shown in FIG. 5
is used as the full adder 42 having the carry control
function. However, the full adder 43 or 44 having a
carry control function shown in FIG. 6 or 7 may be used
instead of the full adder 42 having a carry control
function.
(Third Embodiment) 

FIG. 10 is a block diagram showing an example of
the arrangement of a coprocessor applied to an
arithmetic apparatus and crypto processing apparatus
according to the third embodiment of the present
invention.

This embodiment is a concrete example of the modulo
section of the first embodiment. As shown in FIG. 10, a
controller unit 5 includes a finite field GF(2^m)
arithmetic controller 22a having a modulo function added
to the above function, and a quotient acquisition
circuit 50 which is controlled by the modulo function.

In addition to the above function of controlling an
arithmetic unit 4 to obtain a multiply result c'(x) of
equation (5), the finite field GF(2^m) arithmetic
controller 22a has the function of controlling the
arithmetic unit 4 and quotient acquisition circuit 50 to
execute a modulo for this multiply result c'(x) using a
modulo polynomial f(x). More specifically, the control
function includes the function of inputting/outputting
data to/from a memory 2 and buffers 17X, 17Y, 17Z, and
17R on the basis of the operation algorithm to be
described later, and the function of generating various
commands such as a multiply command, addition command,
and inverse operation command and supplying them to
corresponding arithmetic circuits in accordance with the
input/output operation.

The quotient acquisition circuit 50 is used to
obtain a quotient by dividing the dividend polynomial
c'(x) by the modulo polynomial f(x) as part of a modulo.
In this case, the quotient acquisition circuit 50 has
the function of obtaining the above quotient by
multiplying an inverse β(x) of the modulo polynomial
f(x) and the dividend polynomial c'(x).







The quotient acquisition circuit 50 is controlled by
the finite field GF(2^m) arithmetic controller 22a, and
has the function of supplying the upper two blocks
(F_{L-1}(x), F_{L-2}(x)) of the modulo polynomial f(x) in
the memory 2 to the inverse calculator section 51 only
once per modulo and making the section 51 calculate the
inverse β(x) of the upper two blocks, the function of
reading out the obtained inverse β(x) from the memory 2
when the inverse is written in the memory 2, the
function of obtaining a quotient γ(x) by multiplying the
readout inverse β(x) and the upper two blocks
(C'_{L-1}(x), C'_{L-2}(x)) of the current dividend
polynomial, the function of setting the obtained
quotient γ(x) as a quotient q_i(x) of the upper two
blocks and writing the quotient q_i(x) in the memory 2,
and the function of repeating the operation from reading
out the inverse β(x) to writing the quotient q_i(x)
until a residue c(x) is obtained, as shown in FIG. 11.

As shown in FIG. 13, the inverse calculator section
51 has the function of calculating the inverse β(x) of
the upper two blocks of the modulo polynomial f(x) in
the memory 2 upon reception of the two blocks
(F_{L-1}(x), F_{L-2}(x)) from the quotient acquisition
circuit 50 as shown in FIG. 12, and the function of
writing the obtained inverse β(x) in the memory 2. The
LFSR shown in FIG. 27 is used as a divide circuit in
part of the inverse calculator section 51.

In this case, the inverse β(x) has a fixed number
of bits, and is not a simple inverse but is corrected in
advance as shown in FIG. 13 to eliminate the necessity
to normalize a divisor and denormalize an operation
result in the subsequent main modular multiplication.
In addition, the inverse β(x) itself may be calculated
by the arithmetic unit 4 instead of the quotient
acquisition circuit 50 including the inverse calculator
section 51.

If, for example, the bus width of an integer based
product-sum operation circuit is as small as 8 bits, the
inverse calculator section 51 may be replaced with a
scheme of storing the inverses of all the 8-bit values
as a table in a ROM or the like. If, however, the bus
width is 16 bits or more, the inverse calculator section
51 is more preferable than the scheme of storing the
inverses of all 16-bit values in a ROM in consideration
of a reduction in cost.

The operation of the arithmetic apparatus
(coprocessor) having the above arrangement will be
described next.



In a modular multiplication on a finite field
GF(2^m) in a polynomial basis according to the present
invention, a multiply operation and modulo are
separately performed. More specifically, as shown in
FIG. 14, polynomials a(x) and b(x) as multiplicand/
multiplier and a modulo polynomial f(x) are input
(step ST1), and a multiply operation of a(x) · b(x) is
performed to obtain a multiply result c'(x) having a
double bit length (step ST2). A modulo of c'(x) mod f(x)
is then performed (step ST3) to obtain a residue c(x)
(step ST4).
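
The separation of steps ST2 and ST3 can be sketched as
two independent functions. The following Python code is
an illustrative model only; the names clmul, poly_mod,
and modular_multiply are hypothetical, and the modulo
here is a plain bit-serial reduction used for clarity,
not the block-wise procedure described below.

```python
def clmul(a: int, b: int) -> int:
    """Step ST2: carry-less product c'(x) = a(x) * b(x), up to double length."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(c: int, f: int) -> int:
    """Step ST3: residue of c(x) modulo f(x) over GF(2), one bit at a time."""
    m = f.bit_length() - 1                    # degree of the modulo polynomial
    while c.bit_length() - 1 >= m:
        c ^= f << (c.bit_length() - 1 - m)    # cancel the current leading term
    return c


def modular_multiply(a: int, b: int, f: int) -> int:
    """Steps ST1 to ST4: multiply first, then reduce."""
    c_prime = clmul(a, b)
    return poly_mod(c_prime, f)


if __name__ == "__main__":
    # Example in GF(2^4) with f(x) = x^4 + x + 1: x^3 * x = x^4 = x + 1.
    print(bin(modular_multiply(0b1000, 0b0010, 0b10011)))   # 0b11
```

Because the multiply step does not depend on f(x), any
modulo polynomial can be supplied at run time.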

The multiply operation in step ST2 is performed in 
the same manner as in the first and second embodiments. 
The modulo in steps ST3 and ST4 will be described below. 
Calculation on paper will be described first, and an 
actual process corresponding to calculation on paper 
will then be described. 

As indicated by calculation on paper in FIG. 15, a
modulo for equation (6) is performed after the divisor
f(x) and dividend c'(x) are divided into unit blocks
each consisting of a predetermined number k of bits.
Note that, for example, the number of bits of each unit
block may be set in correspondence with the bus width of
the coprocessor 1.

An upper block c'_{L-1}(x) of the dividend c'(x) is
divided by f(x), and a quotient q_i(x) of one block is
acquired from the upper digit. An operation of
c'(x) - f(x) · q_i(x) is then performed to reduce the
dividend c'(x) by one block from the upper digit.

More specifically, every time the quotient q_i(x) of
one block is multiplied by the divisor polynomial f(x),
(m + 1) blocks are obtained as a multiply result. This
multiply result is subtracted from (= added to) the
current dividend polynomial c'(x) to calculate the next
dividend polynomial of (2m - 1*n) blocks (n is the number
of times of multiply operations). That is, the previous
dividend c'(x) is decreased in units of blocks.

A modulo is completed by repeating this processing,
from acquiring a quotient to performing the subtraction,
n times (= the number of bits of a dividend/the number of
bits of each unit block) and obtaining the residue c(x).

Actual processing for a modulo will be described
next.

In the above modulo, the quotient acquisition
circuit 50 acquires the quotient q_i(x) as shown in
FIG. 11, and the arithmetic unit 4 decreases the
dividend c'(x) by calculating c'(x) - f(x) · q_i(x) as
shown in FIG. 16. The operations of the quotient
acquisition circuit 50 and arithmetic unit 4 will be
sequentially described below.

In calculating the first quotient, the quotient
acquisition circuit 50 reads out the upper two blocks
(F_{L-1}(x), F_{L-2}(x)) of the divisor f(x) from the
memory 2 and inputs them to the inverse calculator
section 51 to calculate the inverse β(x) of the divisor
f(x), as shown in FIGS. 11 and 12.

As shown in FIG. 13 and equation (8), the inverse
calculator section 51 stores a number d of consecutive
0s from the most significant bit MSB of the upper one
block F_{L-1}(x) of the two blocks (F_{L-1}(x),
F_{L-2}(x)) given by

d = count_zero(F_{L-1}(x))   ...(8)

where count_zero() is a function of counting the number
of consecutive 0s from the MSB of the value of ( ).

The inverse calculator section 51 also calculates a
number h of digits of a left shift (to be described
later) on the basis of this number d of invalid digits
by

h = (d + 1) \bmod k   ...(9)

and stores it.
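
Equations (8) and (9) translate directly into a few lines
of code. The following Python sketch assumes that a block
is held as a nonnegative integer smaller than 2^k; the
function name left_shift_digits is hypothetical, while
count_zero follows the name used in equation (8).

```python
def count_zero(block: int, k: int) -> int:
    """Equation (8): number of consecutive 0 bits from the MSB of a k-bit block."""
    return k - block.bit_length()


def left_shift_digits(block: int, k: int) -> int:
    """Equation (9): h = (d + 1) mod k, with d taken from count_zero()."""
    d = count_zero(block, k)
    return (d + 1) % k


if __name__ == "__main__":
    k = 16
    for upper_block in (0x8000, 0x00F0, 0x0001):
        d = count_zero(upper_block, k)
        h = left_shift_digits(upper_block, k)
        print(hex(upper_block), d, h)
```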

As shown in FIG. 13 and equation (10), the inverse
calculator section 51 calculates an inverse α(x) of the
upper two blocks (F_{L-1}(x), F_{L-2}(x)) of the divisor
f(x) using an LFSR 90 by

\alpha(x) = x^{2k} / (F_{L-1}(x) \cdot x^{k} + F_{L-2}(x))   ...(10)

A case in which one block consists of 16 bits (k =
16) will be described. Assume also that a dividend is
x^{2*16} (= x^{2k}) whose most significant bit MSB is "1"
and other bits are "0".

After setting the upper two blocks (F_{L-1}(x),
F_{L-2}(x)) in the LFSR 90 shown in FIG. 27, the inverse
calculator section 51 inputs the dividend x^{2k} to the
shift register from higher orders and repeats a shift in
units of clocks 2*16 times, thereby obtaining a 32-bit
inverse α(x). Note that one block may consist of 32 bits
or another arbitrary number of bits. In such a case as
well, the inverse α(x) is obtained in the same manner.

Subsequently, the inverse calculator section 51
concatenates "0"s of (k - 1) bits and "1" of one bit to
the MSB of this inverse α(x) to obtain a 2k-bit value
α'(x). The inverse calculator section 51 then shifts
this 2k-bit value α'(x) to the left by the number h of
digits of the left shift, obtained by equation (9), to
calculate the corrected inverse β(x):

\beta(x) = \alpha'(x) \cdot x^{h}   ...(11)

In this case, the corrected inverse β(x) is a
value that satisfies equations (8) to (11) above. The
inverse β(x) is calculated only once with respect to
the supplied modulo polynomial f(x) and stored in the
memory 2, and is read out from the memory 2 afterward.
Even if the dividend changes, the inverse β(x) remains
the same as long as the modulo polynomial f(x) remains
the same. For this reason, the inverse β(x) may be
read out from the memory 2 without calculating any new
inverse.
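
The division of equation (10) can be checked in software
by ordinary polynomial long division over GF(2). The
following Python sketch covers only the uncorrected
inverse α(x); the bit concatenation and left shift that
produce β(x) in equation (11) are not modeled. Long
division is used here in place of the LFSR 90, and the
names poly_divmod and uncorrected_inverse are
hypothetical.

```python
def poly_divmod(n: int, d: int) -> tuple[int, int]:
    """Quotient and remainder of n(x) / d(x) with coefficients in GF(2)."""
    if d == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    q = 0
    deg_d = d.bit_length() - 1
    while n.bit_length() - 1 >= deg_d:
        s = n.bit_length() - 1 - deg_d
        q |= 1 << s          # record the quotient term x^s
        n ^= d << s          # subtract (= add) d(x) * x^s
    return q, n


def uncorrected_inverse(f_hi: int, f_lo: int, k: int) -> int:
    """Equation (10): alpha(x) = x^(2k) / (F_{L-1}(x) * x^k + F_{L-2}(x))."""
    divisor = (f_hi << k) | f_lo
    alpha, _remainder = poly_divmod(1 << (2 * k), divisor)
    return alpha


if __name__ == "__main__":
    # Small example with k = 4: the upper two blocks hold x^4 + x + 1.
    print(bin(uncorrected_inverse(0b0001, 0b0011, 4)))   # 0b10011
```

In the example, x^8 = (x^4 + x + 1)^2 + (x^2 + 1) over
GF(2), so the quotient is x^4 + x + 1.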

In a quotient calculation, once the inverse β(x) is
obtained, a quotient is calculated in accordance with
equations (12) to (15) given below.

As indicated by equation (12) and FIG. 11, the
quotient acquisition circuit 50 multiplies the upper two
blocks (C'_{L-1}(x), C'_{L-2}(x)) of a current dividend
C'_i (0 ≤ i ≤ n) and the inverse β(x):

\gamma(x) = \beta(x) \cdot (C'_{L-1}(x) \cdot x^{k} + C'_{L-2}(x))   ...(12)

In addition, as indicated by equation (13), the
quotient acquisition circuit 50 extracts a digit
corresponding to a quotient q_i(x) of one block, as the
second upper block, from a result γ(x), as per

q_i(x) = \gamma(x) / x^{2k}   ...(13)

and writes it in the memory 2. Thus, the quotient q_i(x)
of one block is obtained.

As shown in FIG. 16, the arithmetic unit 4
subtracts a product f(x) · q_i(x) of the divisor and the
quotient from the current dividend c'_i(x).

More specifically, in the arithmetic unit 4, a
finite field GF(2^m) based multiplier circuit 12
multiplies the modulo polynomial f(x) and the quotient
q_i(x) of the current one block to obtain a multiply
result P(x):

P(x) = f(x) \cdot q_i(x)   ...(14)

Adder circuits 14 and 15 subtract (= add) this
multiply result P(x) from the current dividend C'_i to
obtain a next dividend C'_{i+1}:

C'_{i+1} = C'_i + P(x)   ...(15)

The above processing is repeated n times to finally
obtain a modulo result c(x), as shown in FIGS. 14 to 16.
This residue c(x) (= [c_{m-1}, ..., c_1, c_0])
corresponds to the final modular multiplication
result c(x) indicated by equation (3).
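
The repetition of equations (14) and (15) can be modeled
as a block-wise long division. In the following Python
sketch, each block quotient q_i(x) is obtained by a
direct polynomial division of the upper part of the
current dividend; this is an assumption made for
simplicity and replaces the multiplication by the
corrected inverse β(x) of equations (12) and (13). The
names clmul, poly_div, and blockwise_mod are
hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_div(n: int, d: int) -> int:
    """Quotient of n(x) / d(x) over GF(2); d must be nonzero."""
    q = 0
    deg_d = d.bit_length() - 1
    while n.bit_length() - 1 >= deg_d:
        s = n.bit_length() - 1 - deg_d
        q ^= 1 << s
        n ^= d << s
    return q


def blockwise_mod(c_prime: int, f: int, k: int) -> int:
    """Reduce c'(x) modulo f(x), removing k bits of the dividend per iteration."""
    m = f.bit_length() - 1
    excess = max(c_prime.bit_length() - m, 0)   # bits above degree m - 1
    n_blocks = -(-excess // k)                  # ceiling division
    c = c_prime
    for i in range(n_blocks - 1, -1, -1):
        shift = i * k
        q_i = poly_div(c >> shift, f)   # block quotient (at most k bits)
        p = clmul(f, q_i) << shift      # equation (14), aligned to the block
        c ^= p                          # equation (15): subtraction = addition
    return c


if __name__ == "__main__":
    c_prime = clmul(0x57, 0x83)
    print(hex(blockwise_mod(c_prime, 0x11B, 4)))   # 0xc1
```

Each iteration lowers the degree bound of the dividend by
k, so the residue has degree below m after the last
block.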

With the above processing, the modular
multiplication result c(x) represented by equation (6)
can be calculated from the multiply result c'(x)
represented by equation (5) described in the first or
second embodiment, thus completing a modular
multiplication defined by multiply and modulo.
(Evaluation) 

The processing speeds and circuit sizes of the 
coprocessors 1 of the first to third embodiments, which 
perform modular multiplications in the above manner, 
were evaluated. The evaluation results will be 
sequentially described below. 
(Evaluation of Processing Speed) 

FIG. 17 shows the required numbers of clocks of 
commands in the coprocessor 1 when m (number of bits) = 
160 and m = 1024. When this coprocessor is applied to 
the elliptic curve cryptosystem, m (number of bits) =
160 is a typical size. In the case of m = 1024 in
FIG. 17, since the maximum key length currently regarded
as a safe value in the integer based RSA cryptosystem is
1,024 bits, the values in FIG. 17 are presented as speed 
estimates in consideration of an expected increase in 

For a comparison between processing speeds, the
numbers of processing clocks in addition, multiply, and
square operations on the extension field GF(2^160) of
160 bits were evaluated. FIG. 18 shows the results.
Note that the numbers of clocks in addition, square, and
multiply operations include the numbers of clocks based
on a modulo using a modulo polynomial unlike the case
shown in FIG. 17 for the sake of a comparison with the
speed of a GF(2^160) operation.

Each SR ratio as a comparative value is obtained by
dividing the number of clocks in the coprocessor 1 by
the number of clocks in a general shift register circuit.
The smaller this value, the higher the processing speed.
According to these SR ratios, the coprocessor 1 of the
present invention can execute finite field GF(2^m)
arithmetic operations, excluding an addition operation,
at a processing speed equal to or higher than that of
the general shift register circuit.
(Evaluation of Circuit Size) 

As shown in FIG. 19, the total circuit size of the 
coprocessor 1 corresponds to about 30k gates. The 
circuit of the coprocessor 1 is formed by adding the 
circuit for processing a finite field GF(2^m) arithmetic
operation to an integer based coprocessor. 

More specifically, as shown in FIG. 20, in the
arithmetic unit 4, the carry switching circuit is added.
In the controller unit 5, the quotient acquisition
circuit 50 is added for a divide operation, although it
is scarcely required to add any circuits for addition,
multiply, and square operations. No RAM (memory 2) and
I/F need be added because they are shared with the
integer based coprocessor.

The total circuit size of additional circuits is
about 5k gates. The additional circuit size of 5k gates
is not very large in the recent LSI technology. That is,
this size falls within the range in which the
coprocessor 1 of the present invention can be
satisfactorily used in place of the existing coprocessor.

For the sake of comparison, the circuit sizes of
coprocessors designed specifically for finite field
GF(2^m) arithmetic operations were estimated when finite
field GF(2^m) arithmetic operation functions (addition,
multiply, and square operations) were realized without
using the coprocessor 1 of the present invention.
FIG. 21 shows the results.

As shown in FIG. 21, when m = 160, the circuit size
of the coprocessor designed specifically for finite
field GF(2^m) arithmetic operations is 10k gates. When
m = 1024, this size becomes 16k gates. Obviously,
therefore, finite field GF(2^m) arithmetic operation
functions can be realized by the coprocessor 1 of the
present invention with an additional circuit size
smaller than that required when a coprocessor designed
specifically for finite field GF(2^m) arithmetic
operations is used.

As described above, according to this embodiment,
in addition to the effects of the first embodiment, the
following effects can be obtained. Since the long
product-sum operation circuit performs an arithmetic
operation in a modulo instead of the linear feedback
shift register LFSR 90, an arbitrary degree m equal to
or larger than 1 can be used. Even if the degree m of a
finite field GF(2^m) increases, an arithmetic operation
can be executed without modifying the apparatus. In
addition, the elimination of hardware restrictions due
to limitations on the degree m allows the apparatus to
properly cope with an increase in the number of bits of
a crypto key.

In addition, since a finite field GF(2^m) based
modular multiplication is divided into multiply
processing and modulo (divide) to allow the use of an
arbitrary modulo polynomial f(x), general versatility
can be improved.

In a modulo, when the quotient acquisition circuit
50 calculates a quotient on the basis of the dividend
polynomial c'(x) and divisor polynomial f(x) to acquire
a quotient polynomial qi(x) of one block with the number
of bits corresponding to the bus width from higher
orders, the arithmetic unit 4 calculates the next
dividend polynomial by subtracting the multiply
result qi(x) • f(x) of the quotient polynomial qi(x) and
divisor polynomial f(x) from the current dividend
polynomial c'i(x).
The coprocessor 1 obtains the residue c(x) by
repeating this processing, from calculating the quotient
using the quotient acquisition circuit 50 to calculating
the dividend polynomial data by the product-sum
operation using the arithmetic unit 4. This makes it
possible to realize an efficient modulo and quotient
calculation by utilizing the characteristics of hardware.
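
The repeated processing can be sketched as the following loop
in Python. This is only an illustration under stated
assumptions: the names quotient_block and blockwise_mod and the
parameter w (block width in bits) are hypothetical, polynomials
are held as integers whose bit i is the coefficient of x^i, and
quotient_block is a bit-serial stand-in for the quotient
acquisition circuit 50 rather than a model of it.

```python
def clmul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiply)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def quotient_block(c_top: int, f: int) -> int:
    """Stand-in for circuit 50: quotient of c_top(x) / f(x) over GF(2)."""
    deg_f = f.bit_length() - 1
    q = 0
    while c_top.bit_length() - 1 >= deg_f:
        shift = c_top.bit_length() - 1 - deg_f
        q |= 1 << shift
        c_top ^= f << shift
    return q


def blockwise_mod(c: int, f: int, w: int = 8) -> int:
    """Residue of c(x) mod f(x), consuming up to w quotient bits per pass."""
    m = f.bit_length() - 1                 # degree of the divisor
    while c.bit_length() - 1 >= m:
        n = c.bit_length() - 1             # degree of current dividend
        k = min(w, n - m + 1)              # quotient bits taken this pass
        s = n - m - k + 1                  # weight of the lowest of those bits
        q_i = quotient_block(c >> s, f)    # one quotient block, high order first
        # Product-sum step: subtracting q_i(x)*f(x) is an XOR over GF(2).
        c ^= clmul(q_i, f) << s
    return c
```

Each pass lowers the degree of the dividend by at least one
and by up to w, so the loop ends with the residue c(x) of
degree less than m.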
In a quotient calculation, the quotient acquisition
circuit 50 multiplies the inverse data of the upper two
blocks of divisor polynomial data and the upper two
blocks of the current dividend polynomial data, and sets
the second upper block of the multiply result as the
quotient polynomial data of one block. The quotient
acquisition circuit 50 can extract an effective number
portion from the obtained quotient polynomial.
Therefore, the operation precision can be optimized.
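
The idea of acquiring a quotient block by multiplying by
inverse data can be sketched in Python as follows. Several
points are assumptions made for illustration and are not
taken from this specification: the inverse of a divisor
portion F(x) of 1 block + 1 bit is taken to be the quotient of
x^(2w) divided by F(x), the most significant bit of F(x) is
assumed to be 1 already, and the names poly_div, inverse_of,
and quotient_by_inverse and the block width w are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiply)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_div(c: int, f: int) -> int:
    """Quotient of c(x) / f(x) over GF(2), bit-serial reference."""
    deg_f = f.bit_length() - 1
    q = 0
    while c.bit_length() - 1 >= deg_f:
        shift = c.bit_length() - 1 - deg_f
        q |= 1 << shift
        c ^= f << shift
    return q


def inverse_of(f_top: int, w: int) -> int:
    """Assumed inverse: quotient of x^(2w) / F(x), F being w + 1 bits."""
    assert f_top.bit_length() == w + 1
    return poly_div(1 << (2 * w), f_top)


def quotient_by_inverse(c_top2: int, beta: int, w: int) -> int:
    """One quotient block from the upper two dividend blocks and beta."""
    product = clmul(c_top2, beta)
    # Bits 2w .. 3w-1: the second upper block of a four-block product.
    return (product >> (2 * w)) & ((1 << w) - 1)
```

Under these assumptions the block read from the product equals
the quotient of the upper two dividend blocks divided by F(x),
with no correction step, because polynomial arithmetic over
GF(2) has no carries.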

In a quotient calculation, an independent command
is set to calculate the inverse β(x) from the upper two
blocks of the divisor polynomial f(x), and the inverse
β(x) is calculated before a finite field GF(2^m)
arithmetic operation. The obtained inverse β(x) is
stored in the memory 2. In executing a modulo, the
inverse β(x) is read out from the memory 2.



[...] polynomial, a quotient is acquired by reading out
inverse data from the memory, and hence the time
required to calculate inverse data can be saved in the
second and subsequent quotient calculations. This can
shorten the processing time for a finite field GF(2^m)
based multiply (modular multiplication) and square
operation. In addition, since the inverse β(x) can be
calculated in advance, a finite field GF(2^m) based
modular multiplication can be realized by using only the
product-sum operation circuit for performing multiply
and addition operations.

In calculating inverse data, the quotient
acquisition circuit 50 counts the number of consecutive
0s from the high-order bits of the upper two blocks of
divisor polynomial data, and extracts polynomial data of
1 block + 1 bit from the high-order bits such that the
most significant bit is set to 1. The quotient
acquisition circuit 50 obtains the inverse of the
extracted polynomial data, and concatenates 1-block
corrected data whose least significant bit is 1 and
other bits are 0 to the most significant bit of the
obtained inverse so as to obtain 2-block data as a whole.
The quotient acquisition circuit 50 then bit-shifts this
data to the high-order side by an amount corresponding
to the count of 0s, and sets the resultant data as
inverse data.
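
The steps for building the inverse data can be followed in
the Python sketch below. It is one possible reading of the
description, not a model of the quotient acquisition circuit
50: the inverse of the extracted data F(x) is assumed to be
the quotient of x^(2w) divided by F(x), the "obtained inverse"
is assumed to be its low-order block, and the names
make_inverse_data, poly_div, and w (block width in bits) are
hypothetical.

```python
def poly_div(c: int, f: int) -> int:
    """Quotient of c(x) / f(x) over GF(2), bit-serial reference."""
    deg_f = f.bit_length() - 1
    q = 0
    while c.bit_length() - 1 >= deg_f:
        shift = c.bit_length() - 1 - deg_f
        q |= 1 << shift
        c ^= f << shift
    return q


def make_inverse_data(f_top2: int, w: int):
    """Return (inverse data, zero count) for the upper two divisor blocks."""
    # Step 1: count consecutive 0s from the high-order side of 2w bits.
    zeros = 2 * w - f_top2.bit_length()
    assert 0 <= zeros < w, "the top block is assumed to be nonzero"
    # Step 2: extract 1 block + 1 bit so that the most significant bit is 1.
    extracted = (f_top2 << zeros) >> (w - 1)
    # Step 3: inverse of the extracted data (assumed definition).
    inv_full = poly_div(1 << (2 * w), extracted)
    inv_block = inv_full & ((1 << w) - 1)      # low-order block only
    # Step 4: place a block 00...01 above it to form two-block data.
    corrected = (1 << w) | inv_block
    # Step 5: shift to the high-order side by the count of 0s.
    return corrected << zeros, zeros
```

With this reading, the block 00...01 restores the leading
coefficient of the inverse, which is always 1 because F(x) has
degree exactly w.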

A corrected value is set as inverse data to avoid
normalization of a divisor, correction of an approximate
quotient, and denormalization of operation results such
as a quotient and residue, which are performed on the
basis of the Knuth algorithm with a single precision
divide operation which is used in a general long integer
based divide operation. This makes it possible to
decrease the number of bit shifts and optimize
the arithmetic apparatus.

In an integer based multiply operation, for example,
m bits × m bits = 2m bits, so that even if consecutive 0s
are arranged as several upper bits of the 2m bits, the
number of effective bits is 2m. If a divide operation
(modulo) is to be performed by using this multiply
result, since a divide operation cannot be performed by
using 0, a divisor and dividend must be shifted to the
left to be normalized in advance such that 1 is set at
the MSB. When the operation is complete after a
predetermined loop, the operation result (quotient and
residue) must also be denormalized by being shifted to
the right by the number of bits by which the divisor and
dividend were shifted to the left.
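
The pre- and post-processing that an integer based divide
requires can be seen in the simplified Python sketch below.
It is an assumption-laden illustration, not the Knuth
algorithm itself: Python's built-in divmod stands in for the
divide loop, and the name normalized_divmod and the parameter
word_bits are hypothetical. In this simplified model the
quotient is unaffected by shifting both operands equally, so
only the residue is shifted back; how results are shifted
back in a real divider depends on how its loop aligns the
operands.

```python
def normalized_divmod(u: int, v: int, word_bits: int):
    """Divide u by v after left-shifting both so the MSB of v is 1."""
    assert 0 < v < (1 << word_bits)
    # Normalization: number of leading 0s in the divisor word.
    shift = word_bits - v.bit_length()
    q, r_norm = divmod(u << shift, v << shift)
    # Denormalization: undo the shift on the residue.
    return q, r_norm >> shift
```

These two shifts are the processing before and after the
divide loop that the corrected inverse data is intended to
make unnecessary in the finite field GF(2^m) case.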

In this embodiment, since a divisor (inverse data
β(x)) in a quotient operation is corrected to eliminate
the need of processing before and after such a divide
loop, the arithmetic apparatus can be optimized.

In this embodiment, since an arithmetic operation
[...]
the corrected inverse β(x), the number of bit
shifts can be decreased, and the processing speed can be
increased.

Furthermore, an arithmetic apparatus and
encryption/decryption apparatus can be realized with a
small additional circuit amount, each of which
incorporates an LSI that operates at a processing speed
equal to or higher than that of a general shift
register type finite field GF(2^m) based multiplier
circuit with a small number of commands and an
arithmetic system using a long product-sum operation
circuit, and can execute various cryptosystems based on
an integer based operation and finite field GF(2^m)
arithmetic operation. As the cryptosystem using a
finite field GF(2^m) arithmetic operation, an elliptic
curve cryptosystem such as a prime field based elliptic
curve cryptosystem or polynomial base elliptic curve
cryptosystem can be used.

This embodiment has been described as a concrete
example of the divide process in the first embodiment.
Even if this embodiment is practiced as a concrete
example of the divide process in the second embodiment,
similar functions and effects can be obtained.

(Fourth Embodiment)

FIG. 22 is a schematic view showing an example of
the arrangement of a coprocessor applied to an
arithmetic apparatus
according to the fourth embodiment of the present
invention.

This embodiment is a modification of each of the
first to third embodiments and an arithmetic apparatus
designed specifically for finite field GF(2^m)
arithmetic operations. More specifically, the integer
based multiplier circuit 11, selector 13, and integer
arithmetic controller 21 are omitted from the
arrangement of this apparatus. Since the same
arithmetic algorithm as that described above is used,
finite field GF(2^m) based multiply processing is
divided into a multiply operation and modulo, and the
modulo is executed after the multiply operation.

With the above arrangement, the same effects as
those of the first to third embodiments can be obtained
except for the function/effect of an integer based
operation itself and the function/effect of switching of
the integer based operation mode and finite field
GF(2^m) arithmetic operation. In other words, the same
effects as those associated with finite field GF(2^m)
arithmetic operations in the first to third embodiments
can be obtained.

As has been described in detail above, according to
the present invention, an arithmetic apparatus and
crypto processing apparatus which can execute a finite
field GF(2^m) arithmetic operation as well as an integer
[...]
architecture.

In addition, there are provided an arithmetic
apparatus and crypto processing apparatus which can
execute arithmetic operations without modifying the
apparatus configurations even if the degree m of a
finite field GF(2^m) increases.



Additional advantages and modifications will 
readily occur to those skilled in the art. Therefore, 
the invention in its broader aspects is not limited to 
the specific details and representative embodiments 
shown and described herein. Accordingly, various 
modifications may be made without departing from the 
spirit or scope of the general inventive concept as 
defined by the appended claims and their equivalents. 



